Search CORE

19 research outputs found

The space of ultrametric phylogenetic trees

Author: Drummond Alexei J.
Gavryushkin Alex
Publication venue: 'Elsevier BV'
Publication date: 08/06/2016
Field of study

The reliability of a phylogenetic inference method from genomic sequence data is ensured by its statistical consistency. Bayesian inference methods produce a sample of phylogenetic trees from the posterior distribution given sequence data. Hence the question of statistical consistency of such methods is equivalent to the consistency of the summary of the sample. More generally, statistical consistency is ensured by the tree space used to analyse the sample. In this paper, we consider two standard parameterisations of phylogenetic time-trees used in evolutionary models: inter-coalescent interval lengths and absolute times of divergence events. For each of these parameterisations we introduce a natural metric space on ultrametric phylogenetic trees. We compare the introduced spaces with existing models of tree space and formulate several formal requirements that a metric space on phylogenetic trees must possess in order to be a satisfactory space for statistical analysis, and justify them. We show that only a few known constructions of the space of phylogenetic trees satisfy these requirements. However, our results suggest that these basic requirements are not enough to distinguish between the two metric spaces we introduce and that the choice between metric spaces requires additional properties to be considered. Particularly, that the summary tree minimising the square distance to the trees from the sample might be different for different parameterisations. This suggests that further fundamental insight is needed into the problem of statistical consistency of phylogenetic inference methods.Comment: Minor changes. This version has been published in JTB. 27 pages, 9 figure

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Accounting for errors in data improves divergence time estimates in single-cell cancer evolution

Author: Chen K
Drummond AJ
Gavryushkin Alex
Moravec JÍC
Welch D
Publication venue: 'Oxford University Press (OUP)'
Publication date: 27/06/2022
Field of study

Single-cell sequencing provides a new way to explore the evolutionary history of cells. Compared to traditional bulk sequencing, where a population of heterogeneous cells is pooled to form a single observation, single-cell sequencing isolates and amplifies genetic material from individual cells, thereby preserving the information about the origin of the sequences. However, single-cell data is more error-prone than bulk sequencing data due to the limited genomic material available per cell. Here, we present error and mutation models for evolutionary inference of single-cell data within a mature and extensible Bayesian framework, BEAST2. Our framework enables integration with biologically informative models such as relaxed molecular clocks and population dynamic models. Our simulations show that modeling errors increase the accuracy of relative divergence times and substitution parameters. We reconstruct the phylogenetic history of a colorectal cancer patient and a healthy patient from single-cell DNA sequencing data. We find that the estimated times of terminal splitting events are shifted forward in time compared to models which ignore errors. We observed that not accounting for errors can overestimate the phylogenetic diversity in single-cell DNA sequencing data. We estimate that 30-50% of the apparent diversity can be attributed to error. Our work enables a full Bayesian approach capable of accounting for errors in the data within the integrative Bayesian software framework BEAST2

UC Research Repository

Inferring genetic interactions from comparative fitness data

Author: Beerenwinkel Niko
Crona Kristina
Gavryushkin Alex
Greene Devin
Publication venue: 'eLife Sciences Publications, Ltd'
Publication date: 01/01/2017
Field of study

Darwinian fitness is a central concept in evolutionary biology. In practice, however, it is hardly possible to measure fitness for all genotypes in a natural population. Here, we present quantitative tools to make inferences about epistatic gene interactions when the fitness landscape is only incompletely determined due to imprecise measurements or missing observations. We demonstrate that genetic interactions can often be inferred from fitness rank orders, where all genotypes are ordered according to fitness, and even from partial fitness orders. We provide a complete characterization of rank orders that imply higher order epistasis. Our theory applies to all common types of gene interactions and facilitates comprehensive investigations of diverse genetic interactions. We analyzed various genetic systems comprising HIV-1, the malaria-causing parasite Plasmodium vivax, the fungus Aspergillus niger, and the TEM-family of β-lactamase associated with antibiotic resistance. For all systems, our approach revealed higher order interactions among mutations.ISSN:2050-084

Repository for Publications and Research Data

The combinatorics of discrete time-trees: theory and open problems

Author: Gavryushkin Alex
Matsen Frederick A.
Whidden Chris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2018
Field of study

A time-tree is a rooted phylogenetic tree such that all internal nodes are equipped with absolute divergence dates and all leaf nodes are equipped with sampling dates. Such time-trees have become a central object of study in phylogenetics but little is known about the parameter space of such objects. Here we introduce and study a hierarchy of discrete approximations of the space of time-trees from the graph-theoretic and algorithmic point of view. One of the basic and widely used phylogenetic graphs, the NNI graph, is the roughest approximation and bottom level of our hierarchy. More refined approximations discretize the relative timing of evolutionary divergence and sampling dates. We study basic graph-theoretic questions for these graphs, including the size of neighborhoods, diameter upper and lower bounds, and the problem of finding shortest paths. We settle many of these questions by extending the concept of graph grammars introduced by Sleator, Tarjan, and Thurston to our graphs. Although time values greatly increase the number of possible trees, we show that 1-neighborhood sizes remain linear, allowing for efficient local exploration and construction of these graphs. We also obtain upper bounds on the r-neighborhood sizes of these graphs, including a smaller bound than was previously known for NNI. Our results open up a number of possible directions for theoretical investigation of graph-theoretic and algorithmic properties of the time-tree graphs. We discuss the directions that are most valuable for phylogenetic applications and give a list of prominent open problems for those applications. In particular, we conjecture that the split theorem applies to shortest paths in time-tree graphs, a property not shared in the general NNI case.ISSN:1432-1416ISSN:0303-681

Repository for Publications and Research Data